Section 5 - The third of five verbs: filter
Logical operators
R comes with a set of logical operators that you can use inside filter():
x < y, TRUE if x is less than y
x <= y, TRUE if x is less than or equal to y
x == y, TRUE if x equals y
x != y, TRUE if x does not equal y
x >= y, TRUE if x is greater than or equal to y
x > y, TRUE if x is greater than y
x %in% c(a, b, c), TRUE if x is in the vector c(a, b, c)
The following example filters df so that only the observations for which a is positive are kept:
filter(df, a > 0)
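As a minimal sketch of a few of these operators in action, the following uses a small made-up data frame (toy, with hypothetical columns a and g; it is not part of hflights) and assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
toy <- data.frame(
  a = c(-2, 0, 3, 5),
  g = c("x", "y", "z", "x"),
  stringsAsFactors = FALSE
)

# Keep rows where a is strictly positive
positive <- filter(toy, a > 0)

# Keep rows whose g value is one of "x" or "y"
in_set <- filter(toy, g %in% c("x", "y"))

# Keep rows where a is not equal to 0
nonzero <- filter(toy, a != 0)
```

Each call returns a new data frame; toy itself is left unchanged.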
# Load the hflights package
library(hflights)
hflights_df <- hflights[sample(nrow(hflights), 720), ]
hflights <- as_tibble(hflights)
# All flights that traveled 3000 miles or more
filter(hflights, Distance >= 3000) %>% glimpse()
## Observations: 527
## Variables: 21
## $ Year <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 20...
## $ Month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ DayofMonth <int> 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, ...
## $ DayOfWeek <int> 1, 7, 6, 5, 4, 3, 2, 1, 7, 6, 5, 4, 3, 2, 1,...
## $ DepTime <int> 924, 925, 1045, 1516, 950, 944, 924, 1144, 9...
## $ ArrTime <int> 1413, 1410, 1445, 1916, 1344, 1350, 1337, 16...
## $ UniqueCarrier <chr> "CO", "CO", "CO", "CO", "CO", "CO", "CO", "C...
## $ FlightNum <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ TailNum <chr> "N69063", "N76064", "N69063", "N77066", "N76...
## $ ActualElapsedTime <int> 529, 525, 480, 480, 474, 486, 493, 501, 489,...
## $ AirTime <int> 492, 493, 459, 463, 455, 471, 473, 464, 466,...
## $ ArrDelay <int> 23, 20, 55, 326, -6, 0, -13, 135, -15, -10, ...
## $ DepDelay <int> -1, 0, 80, 351, 25, 19, -1, 139, 1, 17, 3, 1...
## $ Origin <chr> "IAH", "IAH", "IAH", "IAH", "IAH", "IAH", "I...
## $ Dest <chr> "HNL", "HNL", "HNL", "HNL", "HNL", "HNL", "H...
## $ Distance <int> 3904, 3904, 3904, 3904, 3904, 3904, 3904, 39...
## $ TaxiIn <int> 6, 13, 4, 7, 4, 5, 5, 7, 6, 3, 6, 4, 6, 4, 5...
## $ TaxiOut <int> 31, 19, 17, 10, 15, 10, 15, 30, 17, 10, 19, ...
## $ Cancelled <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CancellationCode <chr> "", "", "", "", "", "", "", "", "", "", "", ...
## $ Diverted <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,...
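# All flights flown by one of JetBlue, Southwest, or Delta
# Note: UniqueCarrier holds two-letter carrier codes (such as "CO" or "AA"), not
# airline names, so matching on the names below returns 0 rows
filter(hflights, UniqueCarrier %in% c("JetBlue", "Southwest", "Delta")) %>% glimpse()
## Observations: 0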
## Variables: 21
## $ Year <int>
## $ Month <int>
## $ DayofMonth <int>
## $ DayOfWeek <int>
## $ DepTime <int>
## $ ArrTime <int>
## $ UniqueCarrier <chr>
## $ FlightNum <int>
## $ TailNum <chr>
## $ ActualElapsedTime <int>
## $ AirTime <int>
## $ ArrDelay <int>
## $ DepDelay <int>
## $ Origin <chr>
## $ Dest <chr>
## $ Distance <int>
## $ TaxiIn <int>
## $ TaxiOut <int>
## $ Cancelled <int>
## $ CancellationCode <chr>
## $ Diverted <int>
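The zero-row result above is expected, because UniqueCarrier stores carrier codes rather than airline names. One possible way to filter by airline name is to translate the names to codes first. The sketch below uses a hypothetical lookup vector (carrier_codes) and a toy stand-in for hflights; the codes B6, WN and DL are the IATA designators for JetBlue, Southwest and Delta and are assumed here to match the coding used in hflights.

```r
library(dplyr)

# Hypothetical lookup from airline name to carrier code (IATA designators)
carrier_codes <- c(JetBlue = "B6", Southwest = "WN", Delta = "DL")

# Toy stand-in for hflights, made up for illustration
flights <- data.frame(
  UniqueCarrier = c("AA", "B6", "WN", "CO", "DL"),
  Distance = c(224, 1428, 239, 3904, 689),
  stringsAsFactors = FALSE
)

# Matching on airline names finds nothing, because the column holds codes
by_name <- filter(flights, UniqueCarrier %in% names(carrier_codes))

# Matching on the codes returns the intended rows
by_code <- filter(flights, UniqueCarrier %in% carrier_codes)
```

On the real data, the same idea would be filter(hflights, UniqueCarrier %in% carrier_codes), provided the assumed codes are correct.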
# All flights where taxiing took longer than flying
filter(hflights, (TaxiIn + TaxiOut) > AirTime) %>% glimpse()
## Observations: 1,389
## Variables: 21
## $ Year <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 20...
## $ Month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ DayofMonth <int> 24, 30, 24, 10, 31, 31, 31, 31, 30, 30, 30, ...
## $ DayOfWeek <int> 1, 7, 1, 1, 1, 1, 1, 1, 7, 7, 7, 7, 7, 7, 7,...
## $ DepTime <int> 731, 1959, 1621, 941, 1301, 2113, 1434, 900,...
## $ ArrTime <int> 904, 2132, 1749, 1113, 1356, 2215, 1539, 100...
## $ UniqueCarrier <chr> "AA", "AA", "AA", "AA", "CO", "CO", "CO", "C...
## $ FlightNum <int> 460, 533, 1121, 1436, 241, 1533, 1541, 1583,...
## $ TailNum <chr> "N545AA", "N455AA", "N484AA", "N591AA", "N14...
## $ ActualElapsedTime <int> 93, 93, 88, 92, 55, 62, 65, 66, 64, 84, 80, ...
## $ AirTime <int> 42, 43, 43, 45, 27, 30, 30, 32, 31, 40, 37, ...
## $ ArrDelay <int> 29, 12, 4, 48, -2, 20, 15, 10, 10, 54, 16, 1...
## $ DepDelay <int> 11, -6, -9, 31, -4, 13, 4, 0, -1, 39, 2, -4,...
## $ Origin <chr> "IAH", "IAH", "IAH", "IAH", "IAH", "IAH", "I...
## $ Dest <chr> "DFW", "DFW", "DFW", "DFW", "AUS", "AUS", "A...
## $ Distance <int> 224, 224, 224, 224, 140, 140, 140, 140, 140,...
## $ TaxiIn <int> 14, 10, 10, 27, 5, 7, 5, 5, 6, 10, 6, 4, 6, ...
## $ TaxiOut <int> 37, 40, 35, 20, 23, 25, 30, 29, 27, 34, 37, ...
## $ Cancelled <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CancellationCode <chr> "", "", "", "", "", "", "", "", "", "", "", ...
## $ Diverted <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
Combining tests using boolean operators
R also comes with a set of boolean operators that you can use to combine multiple logical tests into a single test. These include & (and), | (or), and ! (not). Instead of using the & operator, you can also pass several logical tests to filter(), separated by commas. The following two calls are completely equivalent:
filter(df, a > 0 & b > 0)
filter(df, a > 0, b > 0)
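The equivalence of the two calls, and the effect of | and !, can be checked on a small example. This is only a sketch on made-up data (the values in df are hypothetical), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical data: one row for each sign combination, plus an extra positive row
df <- data.frame(
  a = c(1, -1, 2, 3, -5),
  b = c(1, 1, -1, 4, -2)
)

# Both tests must hold: written with & and with commas
with_and <- filter(df, a > 0 & b > 0)
with_commas <- filter(df, a > 0, b > 0)

# At least one test holds
either <- filter(df, a > 0 | b > 0)

# Neither test holds: negate the combined test with !
neither <- filter(df, !(a > 0 | b > 0))

# TRUE when the two data frames have the same content
same <- isTRUE(all.equal(with_and, with_commas))
```

Note that commas can only stand in for &; an "or" condition always has to be written with | inside a single argument.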
Next, is.na() will also come in handy. This example keeps the observations in df for which the variable x is not NA:
filter(df, !is.na(x))
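It helps to know that filter() keeps only the rows for which the test is TRUE, so rows where the test evaluates to NA are dropped as well. A minimal sketch on a made-up column x, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical data with two missing values
df <- data.frame(x = c(1, NA, 3, NA, 5))

# Keep the rows where x is present
not_missing <- filter(df, !is.na(x))

# Keep only the rows where x is missing
only_missing <- filter(df, is.na(x))

# NA > 2 evaluates to NA, so the missing rows are silently dropped here
big <- filter(df, x > 2)

# Ask for the missing rows explicitly to keep them
big_or_missing <- filter(df, x > 2 | is.na(x))
```

This is why a test such as DepDelay > 0 never returns flights whose DepDelay is NA.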
# All flights that departed before 5am or arrived after 10pm
filter(hflights, DepTime < 500 | ArrTime > 2200) %>% glimpse()
## Observations: 27,799
## Variables: 21
## $ Year <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 20...
## $ Month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ DayofMonth <int> 4, 14, 10, 26, 30, 9, 31, 31, 31, 31, 31, 31...
## $ DayOfWeek <int> 2, 5, 1, 3, 7, 7, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ DepTime <int> 2100, 2119, 1934, 1905, 1856, 1938, 1919, 21...
## $ ArrTime <int> 2207, 2229, 2235, 2211, 2209, 2228, 2231, 23...
## $ UniqueCarrier <chr> "AA", "AA", "AA", "AA", "AA", "AS", "CO", "C...
## $ FlightNum <int> 533, 533, 1294, 1294, 1294, 731, 190, 209, 2...
## $ TailNum <chr> "N4XGAA", "N549AA", "N3BXAA", "N3BXAA", "N3C...
## $ ActualElapsedTime <int> 67, 70, 121, 126, 133, 290, 132, 268, 141, 1...
## $ AirTime <int> 42, 45, 107, 111, 108, 253, 107, 256, 121, 1...
## $ ArrDelay <int> 47, 69, 80, 56, 54, 78, -12, -15, -18, -10, ...
## $ DepDelay <int> 55, 74, 99, 70, 61, 73, -1, -7, 0, 8, -1, 5,...
## $ Origin <chr> "IAH", "IAH", "IAH", "IAH", "IAH", "IAH", "I...
## $ Dest <chr> "DFW", "DFW", "MIA", "MIA", "MIA", "SEA", "M...
## $ Distance <int> 224, 224, 964, 964, 964, 1874, 964, 1825, 10...
## $ TaxiIn <int> 3, 5, 3, 5, 7, 5, 5, 4, 5, 6, 4, 18, 4, 7, 9...
## $ TaxiOut <int> 22, 20, 11, 10, 18, 32, 20, 8, 15, 9, 18, 22...
## $ Cancelled <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CancellationCode <chr> "", "", "", "", "", "", "", "", "", "", "", ...
## $ Diverted <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
# All flights that departed late but arrived ahead of schedule
filter(hflights, DepDelay > 0, ArrDelay < 0) %>% glimpse()
## Observations: 27,712
## Variables: 21
## $ Year <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 20...
## $ Month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ DayofMonth <int> 2, 5, 18, 18, 12, 13, 26, 1, 10, 12, 15, 17,...
## $ DayOfWeek <int> 7, 3, 2, 2, 3, 4, 3, 6, 1, 3, 6, 1, 4, 7, 6,...
## $ DepTime <int> 1401, 1405, 1408, 721, 2015, 2020, 2009, 163...
## $ ArrTime <int> 1501, 1507, 1508, 827, 2113, 2116, 2103, 173...
## $ UniqueCarrier <chr> "AA", "AA", "AA", "AA", "AA", "AA", "AA", "A...
## $ FlightNum <int> 428, 428, 428, 460, 533, 533, 533, 1121, 112...
## $ TailNum <chr> "N557AA", "N492AA", "N507AA", "N558AA", "N55...
## $ ActualElapsedTime <int> 60, 62, 60, 66, 58, 56, 54, 65, 61, 68, 64, ...
## $ AirTime <int> 45, 44, 42, 46, 39, 44, 39, 37, 41, 44, 48, ...
## $ ArrDelay <int> -9, -3, -2, -8, -7, -4, -17, -9, -5, -6, -9,...
## $ DepDelay <int> 1, 5, 8, 1, 10, 15, 4, 1, 9, 1, 2, 2, 4, 5, ...
## $ Origin <chr> "IAH", "IAH", "IAH", "IAH", "IAH", "IAH", "I...
## $ Dest <chr> "DFW", "DFW", "DFW", "DFW", "DFW", "DFW", "D...
## $ Distance <int> 224, 224, 224, 224, 224, 224, 224, 224, 224,...
## $ TaxiIn <int> 6, 9, 7, 7, 9, 4, 9, 16, 8, 5, 5, 10, 10, 9,...
## $ TaxiOut <int> 9, 9, 11, 13, 10, 8, 6, 12, 12, 19, 11, 11, ...
## $ Cancelled <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CancellationCode <chr> "", "", "", "", "", "", "", "", "", "", "", ...
## $ Diverted <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
# All flights that were cancelled after being delayed
filter(hflights, DepDelay > 0, Cancelled == 1) %>% glimpse()
## Observations: 40
## Variables: 21
## $ Year <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 20...
## $ Month <int> 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4,...
## $ DayofMonth <int> 26, 11, 19, 7, 4, 8, 2, 9, 1, 31, 4, 8, 21, ...
## $ DayOfWeek <int> 3, 2, 3, 5, 5, 2, 3, 3, 2, 4, 1, 5, 4, 1, 1,...
## $ DepTime <int> 1926, 1100, 1811, 2028, 1638, 1057, 802, 904...
## $ ArrTime <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ UniqueCarrier <chr> "CO", "US", "XE", "XE", "AA", "CO", "XE", "X...
## $ FlightNum <int> 310, 944, 2376, 3050, 1121, 408, 2189, 2605,...
## $ TailNum <chr> "N77865", "N452UW", "N15932", "N15912", "N53...
## $ ActualElapsedTime <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ AirTime <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ ArrDelay <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ DepDelay <int> 26, 135, 6, 73, 8, 187, 2, 4, 28, 156, 42, 5...
## $ Origin <chr> "IAH", "IAH", "IAH", "IAH", "IAH", "IAH", "I...
## $ Dest <chr> "EWR", "CLT", "ICT", "JAX", "DFW", "EWR", "D...
## $ Distance <int> 1400, 913, 542, 817, 224, 1400, 217, 217, 68...
## $ TaxiIn <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ TaxiOut <int> NA, NA, NA, 19, 19, NA, NA, NA, 19, NA, NA, ...
## $ Cancelled <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ CancellationCode <chr> "B", "B", "B", "A", "A", "A", "B", "B", "A",...
## $ Diverted <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
Blend together what you’ve learned!
So far, you have learned three data manipulation functions in the dplyr package. Time for a summarizing exercise. You will generate a new dataset from the hflights dataset that contains some useful information on flights that had JFK airport as their destination. You will need select(), mutate() and filter().
# Select the flights that had JFK as their destination: c1
c1 <- filter(hflights, Dest == "JFK")
# Combine the Year, Month and DayofMonth variables to create a Date column: c2
c2 <- mutate(c1, Date = paste(Year, Month, DayofMonth, sep='-'))
# Print out a selection of columns of c2
select(c2, Date, DepTime, ArrTime, TailNum)
## # A tibble: 695 x 4
## Date DepTime ArrTime TailNum
## <chr> <int> <int> <chr>
## 1 2011-1-1 654 1124 N324JB
## 2 2011-1-1 1639 2110 N324JB
## 3 2011-1-2 703 1113 N324JB
## 4 2011-1-2 1604 2040 N324JB
## 5 2011-1-3 659 1100 N229JB
## 6 2011-1-3 1801 2200 N206JB
## 7 2011-1-4 654 1103 N267JB
## 8 2011-1-4 1608 2034 N267JB
## 9 2011-1-5 700 1103 N708JB
## 10 2011-1-5 1544 1954 N644JB
## # ... with 685 more rows
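The three steps above can also be chained with %>%, which avoids the intermediate objects c1 and c2. The sketch below runs on a made-up toy data frame and, as an optional variation that is not part of the original exercise, converts the pasted string into a proper Date with base R's sprintf() and as.Date():

```r
library(dplyr)

# Hypothetical toy stand-in for hflights
toy <- data.frame(
  Year = c(2011L, 2011L, 2011L),
  Month = c(1L, 1L, 2L),
  DayofMonth = c(1L, 2L, 15L),
  Dest = c("JFK", "DFW", "JFK"),
  DepTime = c(654L, 703L, 1639L),
  stringsAsFactors = FALSE
)

jfk <- toy %>%
  # Keep the flights that had JFK as their destination
  filter(Dest == "JFK") %>%
  # Build a zero-padded "YYYY-MM-DD" string and convert it to a Date
  mutate(Date = as.Date(sprintf("%d-%02d-%02d", Year, Month, DayofMonth))) %>%
  # Keep only the columns of interest
  select(Date, DepTime)
```

Storing the column as a Date rather than a character string means it sorts chronologically and works with date arithmetic.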
Recap on select, mutate and filter
With select(), mutate() and filter(), you can already reveal interesting information from a dataset. Through a combination of these expressions or by the use of a one-liner, try to answer the following question:
How many weekend flights flew a distance of more than 1000 miles but had a total taxiing time below 15 minutes?
filter(hflights, DayOfWeek %in% c(6,7), Distance > 1000, (TaxiIn + TaxiOut) < 15) %>% glimpse()
## Observations: 1,739
## Variables: 21
## $ Year <int> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 20...
## $ Month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ DayofMonth <int> 23, 30, 30, 29, 23, 23, 23, 22, 16, 16, 16, ...
## $ DayOfWeek <int> 7, 7, 7, 6, 7, 7, 7, 6, 7, 7, 7, 7, 7, 7, 7,...
## $ DepTime <int> 1535, 851, 2234, 1220, 847, 1224, 931, 942, ...
## $ ArrTime <int> 1933, 1230, 2, 1353, 1213, 1345, 1045, 1340,...
## $ UniqueCarrier <chr> "B6", "CO", "CO", "CO", "CO", "CO", "CO", "C...
## $ FlightNum <int> 624, 1058, 1717, 1620, 1058, 1629, 1723, 1, ...
## $ TailNum <chr> "N599JB", "N39726", "N38417", "N87512", "N16...
## $ ActualElapsedTime <int> 178, 159, 208, 153, 146, 201, 194, 478, 288,...
## $ AirTime <int> 164, 145, 195, 139, 134, 188, 181, 465, 275,...
## $ ArrDelay <int> -27, -13, 89, 19, -30, -27, -28, -10, 12, -1...
## $ DepDelay <int> 0, -2, 94, 45, -6, -1, -5, 17, -2, -5, 3, -1...
## $ Origin <chr> "HOU", "IAH", "IAH", "IAH", "IAH", "IAH", "I...
## $ Dest <chr> "JFK", "DCA", "SAN", "PHX", "DCA", "SNA", "O...
## $ Distance <int> 1428, 1208, 1303, 1009, 1208, 1347, 1334, 39...
## $ TaxiIn <int> 6, 3, 3, 5, 4, 4, 3, 3, 5, 3, 3, 4, 3, 6, 4,...
## $ TaxiOut <int> 8, 11, 10, 9, 8, 9, 10, 10, 8, 10, 9, 10, 11...
## $ Cancelled <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CancellationCode <chr> "", "", "", "", "", "", "", "", "", "", "", ...
## $ Diverted <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
Answer: 1,739 flights, the number of observations reported in the output above.
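Rather than reading the count off the glimpse() header, the number of matching rows can be computed directly with nrow(). A minimal sketch on made-up data (the toy values are hypothetical; on the real data, hflights would take the place of toy):

```r
library(dplyr)

# Hypothetical toy stand-in for hflights
toy <- data.frame(
  DayOfWeek = c(6L, 7L, 1L, 6L, 7L),
  Distance = c(1428L, 1208L, 1303L, 224L, 3904L),
  TaxiIn = c(6L, 3L, 3L, 5L, 10L),
  TaxiOut = c(8L, 11L, 10L, 9L, 30L)
)

# Count weekend flights over 1000 miles with under 15 minutes of total taxiing
n_matching <- toy %>%
  filter(DayOfWeek %in% c(6, 7), Distance > 1000, (TaxiIn + TaxiOut) < 15) %>%
  nrow()
```

Here only the first two toy rows satisfy all three conditions, so n_matching is 2.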
Section 6 - Almost there: the arrange verb
Arranging your data
arrange() can be used to rearrange rows according to any type of data. If you pass arrange() a character variable, for example, R will rearrange the rows in alphabetical order according to values of the variable. If you pass a factor variable, R will rearrange the rows according to the order of the levels in your factor (running levels() on the variable reveals this order).
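The difference between character and factor ordering can be seen on a small made-up example. This sketch assumes dplyr is installed; the sizes data frame and its columns are hypothetical:

```r
library(dplyr)

# Hypothetical data: the same labels stored as character and as factor
sizes <- data.frame(
  label = c("medium", "small", "large"),
  stringsAsFactors = FALSE
)
sizes$size <- factor(sizes$label, levels = c("small", "medium", "large"))

# Character column: rows come out in alphabetical order
by_chr <- arrange(sizes, label)

# Factor column: rows follow the order of the levels
by_fct <- arrange(sizes, size)

# levels() reveals the order that arrange() uses for the factor
level_order <- levels(sizes$size)
```

by_chr lists large, medium, small, while by_fct lists small, medium, large.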
dtc is defined below. The arrange() expressions that follow display its contents in different orders.
# Definition of dtc
dtc <- filter(hflights, Cancelled == 1, !is.na(DepDelay))
# Arrange dtc by departure delays
arrange(dtc, DepDelay) %>%
head(10) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "center", font_size = 11) %>%
row_spec(0, bold = TRUE, color = "white", background = "#3f7689")
| Year | Month | DayofMonth | DayOfWeek | DepTime | ArrTime | UniqueCarrier | FlightNum | TailNum | ActualElapsedTime | AirTime | ArrDelay | DepDelay | Origin | Dest | Distance | TaxiIn | TaxiOut | Cancelled | CancellationCode | Diverted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | 7 | 23 | 6 | 605 | NA | F9 | 225 | N912FR | NA | NA | NA | -10 | HOU | DEN | 883 | NA | 10 | 1 | A | 0 |
| 2011 | 1 | 17 | 1 | 916 | NA | XE | 3068 | N13936 | NA | NA | NA | -9 | IAH | HRL | 295 | NA | NA | 1 | B | 0 |
| 2011 | 12 | 1 | 4 | 541 | NA | US | 282 | N840AW | NA | NA | NA | -9 | IAH | PHX | 1009 | NA | NA | 1 | A | 0 |
| 2011 | 10 | 12 | 3 | 2022 | NA | MQ | 3724 | N539MQ | NA | NA | NA | -8 | IAH | LAX | 1379 | NA | NA | 1 | A | 0 |
| 2011 | 7 | 29 | 5 | 1424 | NA | CO | 1079 | N14628 | NA | NA | NA | -6 | IAH | ORD | 925 | NA | 13 | 1 | A | 0 |
| 2011 | 9 | 29 | 4 | 1639 | NA | OO | 2062 | N724SK | NA | NA | NA | -6 | IAH | ATL | 689 | NA | NA | 1 | B | 0 |
| 2011 | 2 | 9 | 3 | 555 | NA | MQ | 3265 | N613MQ | NA | NA | NA | -5 | HOU | DFW | 247 | NA | 11 | 1 | A | 0 |
| 2011 | 5 | 9 | 1 | 715 | NA | OO | 1177 | N758SK | NA | NA | NA | -5 | IAH | DTW | 1076 | NA | 17 | 1 | A | 0 |
| 2011 | 1 | 20 | 4 | 1413 | NA | UA | 552 | N509UA | NA | NA | NA | -4 | IAH | IAD | 1190 | NA | NA | 1 | A | 0 |
| 2011 | 1 | 17 | 1 | 831 | NA | WN | 1 | N714CB | NA | NA | NA | -4 | HOU | HRL | 276 | NA | 8 | 1 | B | 0 |
# Arrange dtc so that cancellation reasons are grouped
arrange(dtc, CancellationCode) %>%
head(10) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "center", font_size = 11) %>%
row_spec(0, bold = TRUE, color = "white", background = "#3f7689")
| Year | Month | DayofMonth | DayOfWeek | DepTime | ArrTime | UniqueCarrier | FlightNum | TailNum | ActualElapsedTime | AirTime | ArrDelay | DepDelay | Origin | Dest | Distance | TaxiIn | TaxiOut | Cancelled | CancellationCode | Diverted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | 1 | 20 | 4 | 1413 | NA | UA | 552 | N509UA | NA | NA | NA | -4 | IAH | IAD | 1190 | NA | NA | 1 | A | 0 |
| 2011 | 1 | 7 | 5 | 2028 | NA | XE | 3050 | N15912 | NA | NA | NA | 73 | IAH | JAX | 817 | NA | 19 | 1 | A | 0 |
| 2011 | 2 | 4 | 5 | 1638 | NA | AA | 1121 | N537AA | NA | NA | NA | 8 | IAH | DFW | 224 | NA | 19 | 1 | A | 0 |
| 2011 | 2 | 8 | 2 | 1057 | NA | CO | 408 | N11641 | NA | NA | NA | 187 | IAH | EWR | 1400 | NA | NA | 1 | A | 0 |
| 2011 | 2 | 1 | 2 | 1508 | NA | OO | 5812 | N959SW | NA | NA | NA | 28 | IAH | ATL | 689 | NA | 19 | 1 | A | 0 |
| 2011 | 2 | 21 | 1 | 2257 | NA | OO | 1111 | N778SK | NA | NA | NA | -3 | IAH | AUS | 140 | NA | NA | 1 | A | 0 |
| 2011 | 2 | 9 | 3 | 555 | NA | MQ | 3265 | N613MQ | NA | NA | NA | -5 | HOU | DFW | 247 | NA | 11 | 1 | A | 0 |
| 2011 | 3 | 18 | 5 | 727 | NA | UA | 109 | N469UA | NA | NA | NA | -3 | IAH | DEN | 862 | NA | NA | 1 | A | 0 |
| 2011 | 4 | 4 | 1 | 1632 | NA | DL | 8 | N600TR | NA | NA | NA | 42 | IAH | ATL | 689 | NA | NA | 1 | A | 0 |
| 2011 | 4 | 8 | 5 | 1608 | NA | WN | 4 | N365SW | NA | NA | NA | 548 | HOU | DAL | 239 | NA | NA | 1 | A | 0 |
# Arrange dtc according to carrier and departure delays
arrange(dtc, UniqueCarrier, DepDelay) %>%
head(10) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "center", font_size = 11) %>%
row_spec(0, bold = TRUE, color = "white", background = "#3f7689")
| Year | Month | DayofMonth | DayOfWeek | DepTime | ArrTime | UniqueCarrier | FlightNum | TailNum | ActualElapsedTime | AirTime | ArrDelay | DepDelay | Origin | Dest | Distance | TaxiIn | TaxiOut | Cancelled | CancellationCode | Diverted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | 8 | 18 | 4 | 1808 | NA | AA | 1294 | N3FLAA | NA | NA | NA | 3 | IAH | MIA | 964 | NA | NA | 1 | A | 0 |
| 2011 | 2 | 4 | 5 | 1638 | NA | AA | 1121 | N537AA | NA | NA | NA | 8 | IAH | DFW | 224 | NA | 19 | 1 | A | 0 |
| 2011 | 7 | 29 | 5 | 1424 | NA | CO | 1079 | N14628 | NA | NA | NA | -6 | IAH | ORD | 925 | NA | 13 | 1 | A | 0 |
| 2011 | 1 | 26 | 3 | 1703 | NA | CO | 410 | N77296 | NA | NA | NA | 0 | IAH | IAD | 1190 | NA | 13 | 1 | B | 0 |
| 2011 | 8 | 11 | 4 | 1320 | NA | CO | 1669 | N73275 | NA | NA | NA | 0 | IAH | MIA | 964 | NA | NA | 1 | A | 0 |
| 2011 | 7 | 25 | 1 | 1654 | NA | CO | 1422 | N58606 | NA | NA | NA | 24 | IAH | ATL | 689 | NA | NA | 1 | C | 0 |
| 2011 | 1 | 26 | 3 | 1926 | NA | CO | 310 | N77865 | NA | NA | NA | 26 | IAH | EWR | 1400 | NA | NA | 1 | B | 0 |
| 2011 | 3 | 31 | 4 | 1016 | NA | CO | 586 | N19136 | NA | NA | NA | 156 | IAH | MCO | 853 | NA | NA | 1 | B | 0 |
| 2011 | 2 | 8 | 2 | 1057 | NA | CO | 408 | N11641 | NA | NA | NA | 187 | IAH | EWR | 1400 | NA | NA | 1 | A | 0 |
| 2011 | 4 | 4 | 1 | 1632 | NA | DL | 8 | N600TR | NA | NA | NA | 42 | IAH | ATL | 689 | NA | NA | 1 | A | 0 |
Reverse the order of arranging
By default, arrange() arranges the rows from smallest to largest. Rows with the smallest value of the variable will appear at the top of the data set. You can reverse this behavior with the desc() function. arrange() will reorder the rows from largest to smallest values of a variable if you wrap the variable name in desc() before passing it to arrange().
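A minimal sketch of desc() on made-up data, assuming dplyr is installed (the delays data frame is hypothetical). It also shows that, for local data frames, arrange() places missing values last in both directions:

```r
library(dplyr)

# Hypothetical data with one missing delay
delays <- data.frame(
  carrier = c("AA", "CO", "AA", "CO", "AA"),
  delay = c(10, 5, 30, NA, 20),
  stringsAsFactors = FALSE
)

# Smallest to largest (the default); the NA row comes last
ascending <- arrange(delays, delay)

# Largest to smallest; the NA row still comes last
descending <- arrange(delays, desc(delay))

# Carriers alphabetically, then decreasing delay within each carrier
within_carrier <- arrange(delays, carrier, desc(delay))
```

desc() applies only to the variable it wraps, so in the last call carrier is still sorted in ascending order.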
# Arrange according to carrier and decreasing departure delays
arrange(hflights, UniqueCarrier, desc(DepDelay)) %>%
head(10) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "center", font_size = 11) %>%
row_spec(0, bold = TRUE, color = "white", background = "#3f7689")
| Year | Month | DayofMonth | DayOfWeek | DepTime | ArrTime | UniqueCarrier | FlightNum | TailNum | ActualElapsedTime | AirTime | ArrDelay | DepDelay | Origin | Dest | Distance | TaxiIn | TaxiOut | Cancelled | CancellationCode | Diverted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | 12 | 12 | 1 | 650 | 808 | AA | 1740 | N473AA | 78 | 49 | 978 | 970 | IAH | DFW | 224 | 14 | 15 | 0 | | 0 |
| 2011 | 11 | 19 | 6 | 1752 | 1910 | AA | 1903 | N495AA | 78 | 40 | 685 | 677 | IAH | DFW | 224 | 7 | 31 | 0 | | 0 |
| 2011 | 12 | 22 | 4 | 1728 | 1848 | AA | 1903 | N580AA | 80 | 40 | 663 | 653 | IAH | DFW | 224 | 8 | 32 | 0 | | 0 |
| 2011 | 10 | 23 | 7 | 2305 | 2 | AA | 742 | N548AA | 57 | 39 | 507 | 525 | IAH | DFW | 224 | 5 | 13 | 0 | | 0 |
| 2011 | 9 | 27 | 2 | 1206 | 1300 | AA | 1948 | N4YUAA | 54 | 37 | 265 | 286 | IAH | DFW | 224 | 10 | 7 | 0 | | 0 |
| 2011 | 3 | 17 | 4 | 1647 | 1747 | AA | 1505 | N584AA | 60 | 41 | 262 | 277 | IAH | DFW | 224 | 7 | 12 | 0 | | 0 |
| 2011 | 6 | 21 | 2 | 955 | 1315 | AA | 466 | N3FTAA | 140 | 120 | 230 | 235 | IAH | MIA | 964 | 9 | 11 | 0 | | 0 |
| 2011 | 5 | 20 | 5 | 2359 | 130 | AA | 426 | N565AA | 91 | 70 | 255 | 234 | IAH | DFW | 224 | 8 | 13 | 0 | | 0 |
| 2011 | 4 | 19 | 2 | 2023 | 2142 | AA | 1925 | N467AA | 79 | 50 | 242 | 233 | IAH | DFW | 224 | 11 | 18 | 0 | | 0 |
| 2011 | 5 | 12 | 4 | 2133 | 53 | AA | 1294 | N3AYAA | 140 | 121 | 223 | 228 | IAH | MIA | 964 | 5 | 14 | 0 | | 0 |
# Arrange flights by total delay (normal order).
arrange(hflights, (DepDelay + ArrDelay)) %>%
head(10) %>%
kable() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE, position = "center", font_size = 11) %>%
row_spec(0, bold = TRUE, color = "white", background = "#3f7689")
| Year | Month | DayofMonth | DayOfWeek | DepTime | ArrTime | UniqueCarrier | FlightNum | TailNum | ActualElapsedTime | AirTime | ArrDelay | DepDelay | Origin | Dest | Distance | TaxiIn | TaxiOut | Cancelled | CancellationCode | Diverted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | 7 | 3 | 7 | 1914 | 2039 | XE | 2804 | N12157 | 85 | 66 | -70 | -1 | IAH | MEM | 468 | 4 | 15 | 0 | | 0 |
| 2011 | 8 | 31 | 3 | 934 | 1039 | OO | 2040 | N783SK | 185 | 172 | -56 | -11 | IAH | BFL | 1428 | 3 | 10 | 0 | | 0 |
| 2011 | 8 | 21 | 7 | 935 | 1039 | OO | 2001 | N767SK | 184 | 171 | -56 | -10 | IAH | BFL | 1428 | 3 | 10 | 0 | | 0 |
| 2011 | 8 | 28 | 7 | 2059 | 2206 | OO | 2003 | N783SK | 187 | 171 | -54 | -11 | IAH | BFL | 1428 | 5 | 11 | 0 | | 0 |
| 2011 | 8 | 29 | 1 | 935 | 1041 | OO | 2040 | N767SK | 186 | 169 | -54 | -10 | IAH | BFL | 1428 | 4 | 13 | 0 | | 0 |
| 2011 | 12 | 25 | 7 | 741 | 926 | OO | 4591 | N814SK | 165 | 147 | -57 | -4 | IAH | SLC | 1195 | 4 | 14 | 0 | | 0 |
| 2011 | 1 | 30 | 7 | 620 | 812 | OO | 4461 | N804SK | 172 | 156 | -49 | -10 | IAH | SLC | 1195 | 5 | 11 | 0 | | 0 |
| 2011 | 8 | 3 | 3 | 1741 | 1810 | XE | 2603 | N11107 | 89 | 73 | -40 | -19 | IAH | HOB | 501 | 5 | 11 | 0 | | 0 |
| 2011 | 8 | 4 | 4 | 930 | 1041 | OO | 1171 | N715SK | 191 | 177 | -49 | -10 | IAH | BFL | 1428 | 4 | 10 | 0 | | 0 |
| 2011 | 8 | 18 | 4 | 939 | 1043 | OO | 2001 | N783SK | 184 | 172 | -52 | -6 | IAH | BFL | 1428 | 4 | 8 | 0 | | 0 |
Session info
sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 16299)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
## [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Switzerland.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] hflights_0.1 ggplot2_3.1.0 dplyr_0.8.0.1 gapminder_0.3.0
## [5] kableExtra_1.0.1 knitr_1.21
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 highr_0.7 plyr_1.8.4
## [4] pillar_1.3.1 compiler_3.5.2 prettydoc_0.2.1
## [7] tools_3.5.2 digest_0.6.18 gtable_0.2.0
## [10] evaluate_0.12 tibble_2.0.1 viridisLite_0.3.0
## [13] pkgconfig_2.0.2 rlang_0.3.1 cli_1.0.1
## [16] rstudioapi_0.9.0 yaml_2.2.0 xfun_0.4
## [19] withr_2.1.2 httr_1.4.0 stringr_1.4.0
## [22] xml2_1.2.0 hms_0.4.2 webshot_0.5.1
## [25] grid_3.5.2 tidyselect_0.2.5 glue_1.3.0
## [28] R6_2.4.0 fansi_0.4.0 rmarkdown_1.11
## [31] readr_1.3.1 purrr_0.3.0 magrittr_1.5
## [34] scales_1.0.0 htmltools_0.3.6 assertthat_0.2.0
## [37] rvest_0.3.2 colorspace_1.4-0 utf8_1.1.4
## [40] stringi_1.3.1 lazyeval_0.2.1 munsell_0.5.0
## [43] crayon_1.3.4